Subband Selection for Binaural Speech Source Localization
Abstract
We consider the task of speech source localization using binaural cues, namely the interaural time difference (ITD) and the interaural level difference (ILD). A typical approach is to process binaural speech with a gammatone filterbank and compute frame-level ITD and ILD in each subband. During training, the ITD, the ILD and their combination (ITLD) in each subband are modelled statistically with Gaussian mixture models for every direction. Given a binaural test speech signal, the source is localized using a maximum likelihood criterion, assuming that the binaural cues in each subband are independent. In this work, we investigate the robustness of each subband for localization and compare its performance against the full-band scheme with 32 gammatone filters. We propose a subband selection procedure using the training data, in which subbands are rank-ordered by their localization performance. Experiments on Subject 003 from the CIPIC database reveal that, at high SNRs, the ITD and ITLD of just one subband centered at 296 Hz are sufficient to yield localization accuracy identical to that of the full-band scheme for test speech of duration 1 s. At low SNRs, in the case of ITD, the selected subbands are found to perform better than the full-band scheme.
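The maximum likelihood localization and the subband ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces each Gaussian mixture model with a single Gaussian per direction and subband, it uses synthetic cue values rather than CIPIC data, and all function and variable names (`fit_gaussians`, `band_loglik`, `localize`, `rank_subbands`) are hypothetical.

```python
import numpy as np

def fit_gaussians(train):
    """train: (n_dirs, n_bands, n_frames) values of one cue (e.g. ITD).
    Assumption: one Gaussian per (direction, subband) stands in for a GMM."""
    mean = train.mean(axis=2)
    var = train.var(axis=2) + 1e-9  # small floor avoids division by zero
    return mean, var

def band_loglik(test, mean, var):
    """test: (n_bands, n_frames). Returns (n_dirs, n_bands) log-likelihoods,
    summed over frames for each direction and subband."""
    x = test[None, :, :]   # (1, B, T)
    m = mean[:, :, None]   # (D, B, 1)
    v = var[:, :, None]    # (D, B, 1)
    ll = -0.5 * (np.log(2.0 * np.pi * v) + (x - m) ** 2 / v)
    return ll.sum(axis=2)

def localize(test, mean, var, bands=None):
    """Maximum likelihood direction. Subbands are treated as independent,
    so their log-likelihoods add. `bands` restricts the sum to a subset."""
    ll = band_loglik(test, mean, var)
    if bands is not None:
        ll = ll[:, bands]
    return int(np.argmax(ll.sum(axis=1)))

def rank_subbands(utterances, labels, mean, var):
    """Rank subbands by the accuracy each one achieves when used alone."""
    acc = np.zeros(mean.shape[1])
    for test, d in zip(utterances, labels):
        ll = band_loglik(test, mean, var)   # (D, B)
        acc += (ll.argmax(axis=0) == d)     # per-subband decision vs. truth
    acc /= len(labels)
    return np.argsort(-acc, kind="stable"), acc

# Synthetic demo (assumed data): only subband 0 carries direction information.
rng = np.random.default_rng(0)
n_dirs, n_bands = 5, 4
true_mean = np.zeros((n_dirs, n_bands))
true_mean[:, 0] = np.linspace(-0.5, 0.5, n_dirs)

def draw(n_frames):
    """Draw cue values for every direction: (n_dirs, n_bands, n_frames)."""
    noise = rng.normal(0.0, 0.1, size=(n_dirs, n_bands, n_frames))
    return true_mean[:, :, None] + noise

mean, var = fit_gaussians(draw(200))
held_out = draw(100)                     # one utterance per direction
labels = list(range(n_dirs))
order, acc = rank_subbands(list(held_out), labels, mean, var)
best_band_estimate = localize(held_out[3], mean, var, bands=[int(order[0])])
```

In this toy setting the ranking places the informative subband first, and localizing with that single subband recovers the true direction, which mirrors the kind of comparison against the full-band scheme that the abstract reports.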
Similar resources
Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression
The precedence effect describes the ability of the auditory system to suppress the later-arriving components of sound in a reverberant environment, maintaining the perceived arrival azimuth of a sound in the direction of the actual source, even though later reverberant components may arrive from other directions. It is also widely believed that precedence-like processing can improve speech...
Non-negative matrix factorization-based subband decomposition for acoustic source localization
This paper presents a novel non-negative matrix factorization (NMF) based subband decomposition in the frequency-spatial domain for acoustic source localization using a microphone array. The proposed method decomposes source and noise subbands and emphasizes source-dominant frequency bins for a more accurate source representation. By employing NMF, we extract Delay Basis Vectors (DBV) and their subband information in fre...
A binaural microscopic model based on a modulation filter bank for predicting speech intelligibility in normal-hearing listeners
In this study, a binaural microscopic model for the prediction of speech intelligibility based on the modulation filter bank is introduced. So far, the spectral criteria such as the STI and SII or other analytical methods have been used in the binaural models to determine the binaural intelligibility. In the proposed model, unlike all models of binaural intelligibility prediction, an automatic ...
A binaural speech processing method using subband-cross correlation analysis for noise robust recognition
This paper describes an extended subband-crosscorrelation (SBXCOR) analysis to improve robustness against noise. The SBXCOR analysis, which has already been proposed, is a binaural speech processing technique that uses two input signals and extracts the periodicities associated with the inverse of the center frequency (CF) in each subband. In this paper, by taking an exponentially weighted sum of...
A Latently Constrained Mixture Model for Audio Source Separation and Localization
We present a method for audio source separation and localization from binaural recordings. The method combines a new generative probabilistic model with time-frequency masking. We suggest that device-dependent relationships between point-source positions and interaural spectral cues may be learnt in order to constrain a mixture model. This makes it possible to capture subtle separation and localization fe...